System and method for video motion processing

ABSTRACT

In motion compensated video processing, a method of combining a plurality of pictures from an input sequence to form an output picture temporally intermediate two of the input pictures by projecting input pixels to locations on the output picture according to motion vectors assigned to the input pixels, in which the mix of input pixels used to form an output pixel takes into account the number and nature of vectors which point to a given output pixel location from each input picture. In the case where there are a plurality of vectors from one input image pointing to the output pixel location, the method may assign a lower weight to input pixels from that input picture, or may make a statistical analysis of the plurality of vectors in determining the output pixel. Alternatively, increased weighting may be assigned to input pixels the respective vectors of which form conjugate pairs.

This invention is directed to picture building in motion compensated video processing.

Many contemporary standards conversion and other video processing systems employ motion compensation in order to improve the quality of the output pictures. In such systems, it is a typical requirement for new output pictures to be interpolated from original input pictures. Motion compensation assigns motion vectors to the pixels of the input pictures, and these vectors are used to project the original pixels to “build” the output picture.

It is an object of the present invention to provide techniques for improving the quality of the output pictures of such systems.

Accordingly, the invention consists in one aspect in a method of motion compensated combination of two pictures of an input picture sequence to form an output picture at a temporal location between the two input pictures, comprising: projecting input pixels from the input pictures to locations on the output picture using motion vectors assigned to those input pixels; counting the number of vectors from each input picture which point to a given pixel location on the output picture; and employing this count in controlling the mix of the pixels projected by those vectors used to produce the output pixel at the given pixel location.

The inventors have thus recognized that counting the number of vector “hits” at a particular output pixel location gives important information relating to the quality of the eventual output of the motion compensation process. Using this count to control the process therefore results in significant advances in quality.

Preferably, the method comprises employing a non-linear function of the count in controlling said mix.

In one form of the invention, the method comprises, where a plurality of vectors from one of the input pictures point to the given pixel location, assigning lower weight to the respective pixels of those vectors from that input picture for construction of the pixel at the given location. In another form, the method uses an average of the respective pixels of those vectors as the contribution to the output pixel from that input picture.

In still another form, the method comprises, where a plurality of vectors point to the given pixel location, taking a median of the vectors, and using the vector closest to the median for construction of the output pixel.

In another aspect, the invention provides a method of motion compensated combination of two pictures of an input picture sequence to form an output picture at a temporal location between the two input pictures, comprising: projecting input pixels from the input pictures to locations on the output picture using motion vectors assigned to those input pixels; and mixing the respective pixels projected by the vectors onto the output picture to produce an output pixel at a given location, wherein, where a plurality of vectors from one of the input pictures project onto said given pixel location, giving increased weighting in controlling the mix to the respective pixels of vectors forming substantially conjugate pairs.

The invention will now be described by way of example with reference to the accompanying drawings, in which:

FIGS. 1 to 3 are diagrams illustrating the function of picture building in a typical motion compensated system;

FIG. 4 is a diagram illustrating apparatus according to an embodiment of the invention; and

FIG. 5 illustrates an exemplary signal processing operation.

In motion compensated standards conversion, the process of picture building is typically important, the accuracy of the process greatly affecting the quality of the output images or pictures. The input pictures are typically in the form of video fields or frames, though of course, any type of input picture sequence may be employed in the embodiments described. Motion compensated picture building techniques are known to the art, and therefore the basic principles will not be discussed in detail here, though some description of the problems commonly arising follows.

In a picture building procedure, as illustrated in FIG. 1, two input pictures, in this case, two video frames (100 and 102) are used to create an output frame, indicated by dashed line 104. This output frame is to be created at a temporal position between the two input frames, though not necessarily equidistant from them.

In order to derive information illustrating the motion occurring between input images of an image sequence, a motion measurement process (of which the phase correlation technique is preferred) is performed on the input images. The resulting motion vectors are assigned to pixels or groups of pixels in the input image.

In the case illustrated in FIG. 1, vectors 106 and 108 have been assigned to objects in the two input frames; vector 106 points forward (temporally) towards the output frame position, from a pixel (105) on the first input frame (100), and vector 108 points backward from a pixel (107) in the second frame. The vectors are used to project the pixels (105, 107) from the input frames onto the pixel (110) of the output frame which is currently being constructed. A decision is then taken as to which of the pixels to use, or what proportion of each pixel to use in a mix of the two.
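By way of illustration only, the projection of FIG. 1 may be sketched as below. The sketch assumes, which the text does not state, that each vector gives the displacement over the full interval between the two input frames, that the output position is expressed as a fraction alpha between the first frame (0) and the second frame (1), and that landing positions are rounded to whole pixels. The function names are hypothetical.

```python
def project_forward(x, y, vx, vy, alpha):
    """Project a pixel of the first frame forward to the output position.

    (vx, vy) is assumed to be the displacement over the whole frame interval,
    so only the fraction alpha of it is applied.
    """
    return (round(x + alpha * vx), round(y + alpha * vy))


def project_backward(x, y, vx, vy, alpha):
    """Project a pixel of the second frame backward to the output position.

    (vx, vy) is assumed to be the backward-pointing displacement over the
    whole frame interval, so the fraction (1 - alpha) of it is applied.
    """
    return (round(x + (1 - alpha) * vx), round(y + (1 - alpha) * vy))


# A pixel such as 105 and a pixel such as 107 landing on the same output pixel.
landing_105 = project_forward(10, 20, 8, -4, 0.25)
landing_107 = project_backward(18, 16, -8, 4, 0.25)
```

In this example both pixels land on the same output location, which corresponds to the simple case of one hit from each side.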

The above example, however, is merely a simple case where a single vector from each frame may be mapped to the required point. In other cases, there may not be a single vector, or there may be multiple vectors pointing to the output pixel position.

FIG. 2 illustrates one of these cases. Here, a vector (203) projects a pixel (202) from the following frame to the output pixel position (204), but there are two vectors, 201 a and 201 b, pointing from different pixels (200 a, 200 b) on the same, previous frame (100), to the same output pixel position (204). This may indicate, for example, that one object is moving over another in the current video sequence. It can be seen that similar situations will arise with multiple vector “hits” from either side of the output position, and with any number of hits (greater than one).

FIG. 3 illustrates a different situation. Here, a vector (301) projects a pixel (300) from the previous frame to the output pixel position (304), but there is no vector from the following frame.

In other cases there may not be a vector pointing to the output point from either side, in which case there is simply a hole in the output frame.

A prior method of picture building, as disclosed in EP 0,648,398, handles such situations in the following manner. If there is a single vector hit from one frame at the output pixel, the resulting projection of the pixel from that frame is assigned a weighting value of 1. If there is a double hit, each vector is given a weighting of 1, giving an overall weighting for that frame or “side” (of the output position) of 2. Greater numbers of hits increase the total weighting accordingly. However, if there is no vector hit, the “confidence” in that frame is taken as zero; this therefore prevents the eventual mix of the output pixel taking any information from that frame or side which gave a zero hit result.
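The prior weighting rule just described may be pictured with the short sketch below. The side weight equal to the hit count follows the description above; the conversion of the two side weights into shares that sum to one is an assumption made here for illustration and is not taken from EP 0,648,398. The function names are hypothetical.

```python
def prior_side_weights(prev_hits, next_hits):
    """Each hit carries a weighting of 1, so a side's weight is its hit count."""
    return prev_hits, next_hits


def side_shares(prev_weight, next_weight):
    """Turn side weights into mixing shares (normalisation is an assumption)."""
    total = prev_weight + next_weight
    if total == 0:
        return None  # no hit from either side: a hole in the output frame
    return prev_weight / total, next_weight / total


# Double hit from the previous frame, single hit from the following frame.
shares = side_shares(*prior_side_weights(2, 1))
```

Under this rule the side with the double hit receives the larger share, which is the behaviour the embodiments described below set out to change.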

The inventors have recognized that a more sophisticated treatment of picture building which measures where multiple and zero hits occur can bring significant benefit over this prior technique in the quality of the output pictures.

In embodiments, the invention provides a system which identifies the occurrence of such “non-single hits” in the picture building process. The techniques described in the following apply the resulting counts to new methods of picture building which give the previously unexpected result of greatly increasing output picture quality.

In one embodiment, if there is any number of hits, from either of the input frames, which is not equal to one, the input from that frame is simply ignored. Thus in the case illustrated in FIG. 2, the input of both of the pixels 200 a and 200 b, projected by vectors 201 a and 201 b, would be ignored. The only information taken for the output pixel 204 would therefore be that provided by the following frame (102), from pixel 202 and vector 203. In the case illustrated in FIG. 3, the number of hits from the following frame is zero (which is not equal to 1), so that frame is ignored, and pixel 300 and vector 301 are used for the output pixel (304).
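The rule of this embodiment may be sketched as follows. Each side is represented by the list of pixel values projected onto the output location, so the length of the list is the hit count. The equal mix used when both sides register a single hit is an assumption for illustration; the function name and pixel values are hypothetical.

```python
def mix_single_hits_only(prev_pixels, next_pixels):
    """Use a side only if exactly one vector from it hits the output pixel."""
    prev_ok = len(prev_pixels) == 1
    next_ok = len(next_pixels) == 1
    if prev_ok and next_ok:
        # Equal mix of the two sides (an assumption, not stated in the text).
        return 0.5 * (prev_pixels[0] + next_pixels[0])
    if prev_ok:
        return float(prev_pixels[0])
    if next_ok:
        return float(next_pixels[0])
    return None  # neither side usable: left to a fallback mode


fig2_case = mix_single_hits_only([100, 180], [60])  # double hit on previous frame
fig3_case = mix_single_hits_only([90], [])          # no hit from following frame
```

In the FIG. 2 case only the following frame contributes, and in the FIG. 3 case only the previous frame contributes.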

This method may also be implemented in a “softer” version. For example, where a multiple hit occurs, the system may nevertheless include some proportion, say 10%, of the offending vectors' source pixels in constructing the output pixel. This would be of particular use in cases where there are no hits on one side, and multiple hits on the other; at least some of the pixels from those vectors which would otherwise be ignored may be used for the output pixel.
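One possible reading of the softer version is sketched below. It assumes that a single-hit side carries a relative weight of 1, that a multiple-hit side carries a reduced relative weight (the hypothetical parameter `leak`, 0.1 here) applied to the average of its pixels, and that the weights are normalised before mixing. None of these details is fixed by the text.

```python
def soft_mix(prev_pixels, next_pixels, leak=0.1):
    """Mix two sides, letting multiple-hit sides contribute a small proportion."""

    def side(pixels):
        if not pixels:
            return 0.0, 0.0  # zero hits: no contribution
        weight = 1.0 if len(pixels) == 1 else leak
        return weight, sum(pixels) / len(pixels)

    prev_weight, prev_value = side(prev_pixels)
    next_weight, next_value = side(next_pixels)
    total = prev_weight + next_weight
    if total == 0:
        return None  # hole: no hit from either side
    return (prev_weight * prev_value + next_weight * next_value) / total


one_sided = soft_mix([100, 180], [])  # multiple hits on one side, none on the other
```

With no hits on one side and multiple hits on the other, the output is taken entirely from the multiple-hit pixels rather than being left empty.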

In most cases, the system will employ some sort of “fallback” mode, in order to prevent failure, or allow a “hole” to appear in the output frame where there are no hits from either side.

FIG. 4 is a schematic diagram of a video processing apparatus according to one embodiment of the invention in which an output frame is constructed temporally intermediate two input frames. The previous frame and corresponding motion vectors are input to a forward projection stage 402. The resulting frame is then processed by hole filler 404, which fills small holes in the picture to produce a forwards projected frame which is input to a first input of mixer 410. The motion vectors for the previous frame are also input to a hit detector 408 which counts the number of motion vectors from the previous frame which point toward each pixel location in the forwards projected frame, to produce a “No. of hits” signal. This will tend to be a step or delta type function, and it is therefore passed to a processing stage 406 which produces, from the “No. of hits” a smoothly varying output, in order not to introduce sharp edging effects. This signal then acts as a “prediction of quality” for the forwards projected frame.
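The counting performed by a hit detector such as 408 may be sketched as below. The sketch assumes a dictionary mapping each source pixel to its vector, vectors expressed as displacement over the full frame interval, an output position alpha between 0 and 1, and rounding of landing positions to whole pixels. These representations and the function name are hypothetical.

```python
def count_hits(vectors, width, height, alpha):
    """Count, per output pixel, how many projected vectors land on it.

    vectors maps a source pixel (x, y) to its motion vector (vx, vy).
    """
    hits = [[0] * width for _ in range(height)]
    for (x, y), (vx, vy) in vectors.items():
        tx = round(x + alpha * vx)
        ty = round(y + alpha * vy)
        if 0 <= tx < width and 0 <= ty < height:
            hits[ty][tx] += 1
    return hits


# One row of four pixels; the first pixel moves while the others are still.
row_vectors = {(0, 0): (2, 0), (1, 0): (0, 0), (2, 0): (0, 0), (3, 0): (0, 0)}
hit_row = count_hits(row_vectors, width=4, height=1, alpha=0.5)
```

The resulting row contains a zero hit, a double hit and two single hits, which is the kind of stepped signal that stage 406 then smooths.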

An example of a process performed by stages 406 and 416 will now be described briefly with reference to FIG. 5. A signal representing the number of hits is shown in FIG. 5 a. Portion 502 registers 2 hits while extended portion 504 registers no hits. The rest of the signal represents a single hit. This signal is converted into the signal in FIG. 5 b which represents those portions of the signal having a single hit as “high” and all other portions as “low”. In FIG. 5 c the signal has been filtered to remove any very short variations such as that at 506. Finally, in FIG. 5 d, any step edges are replaced by portions of constant slope providing a smoothly varying indication of quality, which provides a higher indication towards the edges of areas not having a single hit, moving to a lowest indication of quality at the center of such an area. In this example the slope is fitted to the signal in 5 c such that the value of the ‘corners’ of the signal is maintained.
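The three steps of FIG. 5 may be sketched in one dimension as follows. This is one possible realisation only: the minimum run length `min_run`, the ramp gradient `slope`, and the choice of measuring distance to the nearest single-hit sample are assumptions, and the function names are hypothetical.

```python
def single_hit_mask(hits):
    """FIG. 5 b: high (1) where exactly one hit, low (0) elsewhere."""
    return [1 if h == 1 else 0 for h in hits]


def remove_short_runs(mask, min_run):
    """FIG. 5 c: flip interior runs shorter than min_run to match neighbours."""
    out = list(mask)
    i = 0
    while i < len(out):
        j = i
        while j < len(out) and out[j] == out[i]:
            j += 1
        if 0 < i and j < len(out) and j - i < min_run:
            out[i:j] = [1 - out[i]] * (j - i)
            i = 0  # rescan, since the flipped run has merged with its neighbours
        else:
            i = j
    return out


def ramp_quality(mask, slope):
    """FIG. 5 d: replace step edges by constant slopes into each low area."""
    highs = [i for i, m in enumerate(mask) if m == 1]
    quality = []
    for i, m in enumerate(mask):
        if m == 1:
            quality.append(1.0)  # the 'corner' value at the edge is kept
        elif not highs:
            quality.append(0.0)
        else:
            distance = min(abs(i - h) for h in highs)
            quality.append(max(0.0, 1.0 - slope * distance))
    return quality


hits = [1, 1, 2, 1, 1, 0, 0, 0, 0, 0, 1, 1]
poq = ramp_quality(remove_short_runs(single_hit_mask(hits), 2), 0.25)
```

Here the isolated double hit is filtered out as a very short variation, and the extended zero-hit area yields a quality that falls from its edges to a minimum at its center.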

Returning now to FIG. 4, the next frame and corresponding motion vectors are processed, in a similar fashion to the previous frame, by elements 412, 414, 416 and 418 which are analogous to elements 402, 404, 406 and 408, to produce a backwards projected frame, and a “prediction of quality” for the backwards projected frame. The backwards projected frame is passed to the second input of mixer 410, while the two prediction of quality signals are input to comparison stage 420. Comparison stage 420 compares the prediction of quality signals for the two candidate frames input to mixer 410, and produces an output signal which controls the proportions of the candidate frames which are mixed, according to methods described previously.

The output from mixer 410 is passed to a first input of a further mixer 422. The second input to mixer 422 is a “fall back frame” which is provided by stage 424, which selects the input frame which is temporally closest to the output frame. Mixer 422 is controlled by controller 426 which, similar to comparison stage 420, receives the two prediction of quality signals for the respective forward and backward projected candidate frames. Controller 426 selects the greater of the two input signals which provides an overall prediction of quality for the output of mixer 410. This overall prediction of quality signal is used to control the proportions of input signals which are mixed at mixer 422 to produce the output 424.
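For a single pixel, the two mixing stages of FIG. 4 may be sketched as below. Taking the greater of the two prediction of quality values as the overall value follows the description of controller 426. The proportional rule used for the first mixer, and the use of the overall value directly as the cross-fade coefficient of the second mixer, are assumptions, with prediction of quality values taken to lie between 0 and 1. The function name is hypothetical.

```python
def build_pixel(fwd, bwd, q_fwd, q_bwd, fallback):
    """Mix forward and backward projected pixels, then blend with a fallback."""
    total = q_fwd + q_bwd
    # First mixer (410): share of the forward candidate, assumed proportional
    # to its prediction of quality; an equal mix if both are zero.
    k = q_fwd / total if total > 0 else 0.5
    candidate = k * fwd + (1 - k) * bwd
    # Controller (426): overall prediction of quality is the greater value.
    overall = max(q_fwd, q_bwd)
    # Second mixer (422): cross-fade towards the fallback pixel as overall falls.
    return overall * candidate + (1 - overall) * fallback


good_both_sides = build_pixel(100, 60, 1.0, 1.0, fallback=40)
poor_both_sides = build_pixel(100, 60, 0.0, 0.0, fallback=40)
```

When both predictions are high the output is the mix of the projected pixels, and when both are zero the output is the fallback pixel alone.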

Thus the previous frame is forward projected, and the following frame back projected to an intermediate temporal location, and the projections are mixed in dependence upon measurements of the number of hits arising on either side. Separate “predictions of quality” (PoQ), dependent upon hit count, are derived for the previous and following frames, and these are compared to control the projection mix. For example, if a single hit is registered for a given pixel, the PoQ is high, whereas if zero or multiple hits are registered, the PoQ is low.

In an alternative embodiment, the median of all vectors pointing to a given pixel on the output frame is taken. A number of options are then available: the closest vector to the median is taken, and the other vectors rejected; in a case where there is simply a double hit on one side, the offending vector is rejected as an outlier, as the other two vectors are closer to the median; fractions of the various vectors are taken, according to their proximity to the median. These approaches may be effective in cases where a plurality of spurious vectors produce the multiple hits.
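The first of these options may be sketched as follows. The text does not say how the median of two-dimensional vectors is formed; the sketch assumes a component-wise median, and assumes that all vectors have first been expressed in a common temporal direction (for example by negating the backward-pointing ones). The function name and vector values are hypothetical.

```python
from statistics import median


def closest_to_median(vectors):
    """Return the vector nearest to the component-wise median of the set."""
    mx = median(v[0] for v in vectors)
    my = median(v[1] for v in vectors)
    return min(vectors, key=lambda v: (v[0] - mx) ** 2 + (v[1] - my) ** 2)


# Double hit on one side plus one hit on the other: one vector is an outlier.
candidates = [(4, 0), (4, 1), (-9, 6)]
chosen = closest_to_median(candidates)
```

The two mutually consistent vectors lie near the median, so the spurious vector is the one left unused.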

In a further embodiment, the confidence assigned to the vector hits on one “side” of the output frame position is normalised. Thus if there is a double hit on one side, the contribution to the mix may be ¼ of each pixel in the double hit, and ½ of the pixel on the other side.
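The normalisation may be sketched as below, with each side that has at least one hit receiving an equal share that is then divided among that side's hits. Giving the whole mix to a side when the other side has no hits is an assumption; the function name is hypothetical.

```python
def normalised_weights(prev_hits, next_hits):
    """Per-pixel weights such that each contributing side has equal total weight."""
    contributing = [n for n in (prev_hits, next_hits) if n > 0]
    if not contributing:
        return [], []  # hole: no hit from either side
    side_share = 1.0 / len(contributing)
    prev = [side_share / prev_hits] * prev_hits if prev_hits else []
    nxt = [side_share / next_hits] * next_hits if next_hits else []
    return prev, nxt


double_and_single = normalised_weights(2, 1)
```

For a double hit on one side and a single hit on the other this gives a quarter for each pixel of the double hit and a half for the pixel on the other side, as in the example above.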

In a still further embodiment, where there are multiple hits, the vectors on the “multiple hit side” are compared with those on the other side. If one vector is the conjugate (or near conjugate) of one of the vectors on the other side, as in FIG. 2, vectors 201 b and 203, then the other vector, 201 a, is discarded. Essentially, the only vectors taken for the decision on mixing the output pixel are such conjugate pairs, as these match the flow of the vector field along the current sequence.
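The pairing test may be sketched as follows. The sketch assumes that two vectors are conjugate when the backward vector is the reverse of the forward vector, and near conjugate when their sum is within a tolerance `tol` in each component. That interpretation, the tolerance and the vector values standing in for 201 a, 201 b and 203 are all assumptions.

```python
def conjugate_pairs(prev_vectors, next_vectors, tol=1.0):
    """Return index pairs (i, j) where next_vectors[j] is about -prev_vectors[i]."""
    pairs = []
    for i, (fx, fy) in enumerate(prev_vectors):
        for j, (bx, by) in enumerate(next_vectors):
            if abs(fx + bx) <= tol and abs(fy + by) <= tol:
                pairs.append((i, j))
    return pairs


vectors_201 = [(6, 0), (2, -2)]  # stand-ins for 201 a and 201 b
vectors_203 = [(-2, 2)]          # stand-in for 203
kept = conjugate_pairs(vectors_201, vectors_203)
```

Only the second vector of the double hit forms a pair with the vector from the other side, so the first is discarded.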

In the embodiments described above, hit counts are generally described as integer values. In alternatives, if a phase correlation process is implemented to sub-pixel accuracy, then a more sophisticated approach is possible. The hit count becomes an accumulation over an area of non-integer hit values, rather than a simple count of vectors pointing to an integer value. Such “soft” hit counts may be processed as in any of the preceding methods in order to provide an output pixel.
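One way of forming such soft hit counts is sketched below, assuming that each sub-pixel landing position spreads a total hit value of one over its four neighbouring pixels with bilinear weights. The bilinear weighting is an assumption chosen for illustration, since the text does not specify how the accumulation is performed; the function name is hypothetical.

```python
import math


def accumulate_soft_hits(landings, width, height):
    """Accumulate fractional hit values from sub-pixel landing positions."""
    hits = [[0.0] * width for _ in range(height)]
    for x, y in landings:
        x0, y0 = math.floor(x), math.floor(y)
        fx, fy = x - x0, y - y0
        neighbours = (
            (0, 0, (1 - fx) * (1 - fy)),
            (1, 0, fx * (1 - fy)),
            (0, 1, (1 - fx) * fy),
            (1, 1, fx * fy),
        )
        for dx, dy, weight in neighbours:
            xi, yi = x0 + dx, y0 + dy
            if 0 <= xi < width and 0 <= yi < height:
                hits[yi][xi] += weight
    return hits


soft = accumulate_soft_hits([(1.5, 1.0), (2.0, 1.0)], width=4, height=3)
```

A vector landing halfway between two pixels contributes half a hit to each, so a pixel may accumulate a count such as 1.5 rather than an integer.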

In general, certain fallback options are required where zero hits or spurious vectors occur. For example, if vectors on either side produce an inequality or disagreement, the system may take the vector from the closest frame to the output temporal position. Where the hit count is zero on both sides, “holes” occur in the output frame. In such cases, “hole filling” or copying of pixels from either frame may be implemented. In other cases, the system may use the fallback picture, as in FIG. 4.

In the above description of certain embodiments of the invention, the example of the projection of two input pictures onto an output picture location is used. It should be noted that aspects of the invention are equally applicable to techniques in which more than two input pictures, and their respective pixels and assigned vectors, are used to create the output picture. Here, notwithstanding the methods described for weighting pixels in particular ways, the proportions of pixels used in the final mix may depend to a greater extent upon the distance of the input picture in question from the temporal location of the output picture.

It will be appreciated by those skilled in the art that the invention has been described by way of example only, and that a wide variety of alternative approaches may be adopted. In particular, the various methods described may be used in conjunction, in a variety of advantageous combinations.

1. A method of motion compensated combination of a plurality of pictures of an input picture sequence to form an output picture at a temporal location between two of the input pictures, the method comprising: projecting input pixels from the input pictures to locations on the output picture using motion vectors assigned to those input pixels; counting the number of vectors from each input picture which point to a given pixel location on the output picture; and employing this count in controlling the mix of the pixels projected by those vectors used to produce the output pixel at the given pixel location.
2. A method according to claim 1, comprising employing a non-linear function of the count in controlling said mix.
3. A method according to claim 1, comprising, where a plurality of vectors from one of the input pictures point to the given pixel location, assigning a lower weight to the respective pixels of those vectors from that input picture for construction of the pixel at the given location.
4. A method according to claim 1, comprising, where a plurality of vectors point to the given pixel location, taking a median of the vectors, and using the vector closest to the median for construction of the output pixel.
5. A method according to claim 1, comprising, where a plurality of vectors from one of the input pictures point to the given pixel location, using an average of the respective pixels of those vectors as the contribution to the output pixel from that input picture.
6. A method of motion compensated combination of a plurality of pictures of an input picture sequence to form an output picture at a temporal location between two of the input pictures, the method comprising: projecting input pixels from the input pictures to locations on the output picture using motion vectors assigned to those input pixels; and mixing the respective pixels projected by the vectors onto the output picture to produce an output pixel at a given location, wherein, where a plurality of vectors from one of the input pictures project onto said given pixel location, giving increased weighting in controlling the mix to the respective pixels of vectors forming substantially conjugate pairs.
7. Video processing apparatus for forming an output picture at a selected temporal location from a sequence of input pictures having associated motion vectors, the apparatus comprising: a temporal picture projector for projecting input pictures to the temporal location of the output picture using the motion vectors associated respectively with said input pictures, to form projected pictures; a counter for counting the number of motion vectors from the input pictures pointing towards each pixel of the respective projected picture for each of the input pictures; and a first mixer for mixing the projected pictures, adapted to mix the pixels of projected pictures in varying proportions, such that at each pixel in the mix the relative proportion from each candidate picture is dependent on the number of motion vectors from the respective input picture pointing towards the spatial location of that pixel.
8. Apparatus according to claim 7, including a processor receiving from the counter, for each input picture, a signal representing the number of motion vectors pointing towards each pixel location, and processing this signal to produce, for each projected picture, a smoothed prediction of quality signal which is passed to the first mixer to control the mixing of candidate pictures.
9. Apparatus according to claim 8, further comprising a second mixer which receives as its inputs the output of the first mixer and a selected one of the input pictures, adapted to mix its inputs in varying proportions according to an overall prediction of quality signal derived from the prediction of quality signals for each candidate picture.
10. Apparatus according to claim 9, wherein the selected one of the input pictures is the picture temporally closest to the temporal location of the output picture.